Add multi GPU CI job for libcu++ #9435
Overview
This PR adds a 2-GPU CI job to
Changes
| Layer / File(s) | Summary |
|---|---|
| GPU configuration schema with name and runner fields `ci/matrix.yaml` | GPU definitions now include `name` and `runner` fields alongside `sm`; new `h100_2gpu` model and updated `rtxpro6000` runner labels are added. |
| GPU validation and build-time job naming/runner generation `build-workflow.py` | `get_gpu` validates that GPU definitions include required `name`, `runner`, and `sm` fields; `generate_dispatch_job_name` appends `gpu["name"]` to job display names; `generate_dispatch_job_runner` uses `gpu["runner"]` instead of `gpu["id"]-latest-1`. |
| Multi-GPU Docker device selection at runtime `action.yml` | GPU device selection logic now detects multi-GPU runner labels via regex and requests `--gpus all` for counts > 1, otherwise uses prior `device=${NVIDIA_VISIBLE_DEVICES:-}` behavior. |
| Multi-GPU test matrix entry for h100_2gpu `ci/matrix.yaml` | New `pull_request` matrix job entry for `libcudacxx` targeting `h100_2gpu` GPU configuration. |
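The device-selection rule in the summary above can be sketched in Python for illustration. The real logic lives in `action.yml` and is not shown in this conversation. The label format (a trailing `-<count>` suffix, loosely modeled on the `-latest-1` suffix mentioned above), the regex, and the function name `docker_gpu_args` are all assumptions, not the PR's actual code.

```python
from __future__ import annotations

import re

# ASSUMPTION: the runner label ends in "-<gpu count>", e.g. "...-latest-2".
# The actual regex used in action.yml is not shown in this PR conversation.
_GPU_COUNT_RE = re.compile(r"-(\d+)$")


def docker_gpu_args(runner_label: str, visible_devices: str = "") -> list[str]:
    """Return the `docker run` GPU arguments for a runner label (hypothetical helper)."""
    match = _GPU_COUNT_RE.search(runner_label)
    # Labels without a recognizable count are treated as single-GPU runners.
    gpu_count = int(match.group(1)) if match else 1
    if gpu_count > 1:
        # Multi-GPU runner: expose every GPU to the container.
        return ["--gpus", "all"]
    # Single-GPU runner: keep the prior behavior of pinning to the visible device.
    return ["--gpus", f"device={visible_devices}"]
```

For example, `docker_gpu_args("gpu-h100-latest-2")` yields `["--gpus", "all"]`, while a label ending in `-1` falls back to the `device=` form.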
😬 CI Workflow Results
🟥 Finished in 1h 46m: Pass: 99%/505 | Total: 3d 20h | Max: 1h 05m | Hits: 99%/641420
See results here.
We could use multi-GPU CI jobs to test interactions of cccl-rt with multiple GPUs. This PR adds a 2-GPU job to matrix.yaml.
Supporting the 2-GPU runners required updating how GPU names are translated to runner labels.
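A minimal sketch of what the updated name-to-label translation might look like, assuming each GPU entry carries explicit `name`, `runner`, and `sm` fields as the review summary describes. The table contents, the label strings, and the helper names `lookup_gpu` and `runner_label_for` are hypothetical stand-ins, not the actual contents of `ci/matrix.yaml` or `build-workflow.py`.

```python
REQUIRED_GPU_FIELDS = ("name", "runner", "sm")

# ASSUMPTION: illustrative entries only; the real values in ci/matrix.yaml may differ.
GPUS = {
    "h100": {"name": "H100", "runner": "gpu-h100-latest-1", "sm": "90"},
    "h100_2gpu": {"name": "2x H100", "runner": "gpu-h100-latest-2", "sm": "90"},
}


def lookup_gpu(gpu_id, gpus=GPUS):
    """Fetch a GPU definition and check that the required fields are present."""
    if gpu_id not in gpus:
        raise KeyError(f"Unknown GPU id: {gpu_id!r}")
    gpu = dict(gpus[gpu_id])  # copy so the shared table is not mutated
    missing = [field for field in REQUIRED_GPU_FIELDS if field not in gpu]
    if missing:
        raise ValueError(f"GPU {gpu_id!r} is missing fields: {', '.join(missing)}")
    gpu["id"] = gpu_id
    return gpu


def runner_label_for(gpu):
    """Use the explicit runner field instead of deriving '<id>-latest-1' from the id."""
    return gpu["runner"]
```

With an explicit `runner` field, a 2-GPU entry such as `h100_2gpu` can point at a differently named runner without the script having to guess the label from the GPU id.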